44 research outputs found
Data transfer scheduling with advance reservation and provisioning
Over the years, scientific applications have become more complex and more data intensive. Although through the use of distributed resources the institutions and organizations gain access to the resources needed for their large-scale applications, complex middleware is required to orchestrate the use of these storage and network resources between collaborating parties, and to manage the end-to-end processing of data. We present a new data scheduling paradigm with advance reservation and provisioning. Our methodology provides a basis for provisioning end-to-end high performance data transfers which require integration between system, storage and network resources, and coordination between reservation managers and data transfer nodes. This allows researchers/users and higher level meta-schedulers to use data placement as a service where they can plan ahead and reserve time and resources for their data movement operations. We present a novel approach for evaluating time-dependent structures with bandwidth guaranteed paths. We present a practical online scheduling model using advance reservation in dynamic network with time constraints. In addition, we report a new polynomial algorithm presenting possible reservation options and alternatives for earliest completion and shortest transfer duration. We enhance the advance network reservation system by extending the underlying mechanism to provide a new service in which users submit their constraints and the system suggests possible reservation requests satisfying users\u27 requirements. We have studied scheduling data transfer operation with resource and time conflicts. We have developed a new scheduling methodology considering resource allocation in client sites and bandwidth allocation on network link connecting resources. Some other major contributions of our study include enhanced reliability, adaptability, and performance optimization of distributed data placement tasks. While designing this new data scheduling architecture, we also developed other important methodologies such as early error detection, failure awareness, job aggregation, and dynamic adaptation of distributed data placement tasks. The adaptive tuning includes dynamically setting data transfer parameters and controlling utilization of available network capacity. Our research aims to provide a middleware to improve the data bottleneck in high performance computing systems
Failure-awareness and dynamic adaptation in data scheduling
Over the years, scientific applications have become more complex and more data intensive. Especially large scale simulations and scientific experiments in areas such as physics, biology, astronomy and earth sciences demand highly distributed resources to satisfy excessive computational requirements. Increasing data requirements and the distributed nature of the resources made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in dealing with wide area data access, retrieval and processing. In this study, we explore data-intensive distributed computing and study challenges in data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and the key attributes for reliability, adaptability and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some of the known low-level data handling and optimization techniques to distributed computing. Two major contributions of this work include (i) a failure-aware data placement paradigm for increased fault-tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. The failure-aware data placement includes early error detection, error classification, and use of this information in scheduling decisions for the prevention of and recovery from possible future errors. The adaptive scheduling approach includes dynamically tuning data transfer parameters over wide area networks for efficient utilization of available network capacity and optimized end-to-end data transfer performance
Recommended from our members
An Online Scheduling Algorithm with Advance Reservation for Large-Scale Data Transfers
Scientific applications and experimental facilities generate massive data sets that need to be transferred to remote collaborating sites for sharing, processing, and long term storage. In order to support increasingly data-intensive science, next generation research networks have been deployed to provide high-speed on-demand data access between collaborating institutions. In this paper, we present a practical model for online data scheduling in which data movement operations are scheduled in advance for end-to-end high performance transfers. In our model, data scheduler interacts with reservation managers and data transfer nodes in order to reserve available bandwidth to guarantee completion of jobs that are accepted and confirmed to satisfy preferred time constraint given by the user. Our methodology improves current systems by allowing researchers and higher level meta-schedulers to use data placement as a service where theycan plan ahead and reserve the scheduler time in advance for their data movement operations. We have implemented our algorithm and examined possible techniques for incorporation into current reservation frameworks. Performance measurements confirm that the proposed algorithm is efficient and scalable
Recommended from our members
Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks
Advance Resource Provisioning in Bulk Data Scheduling ∗
Today’s scientific and business applications generate massive data sets that need to be transferred to remote sites for sharing, processing, and long term storage. Because of increasing data volumes and enhancement in current network technology that provide ondemand high-speed data access between collaborating institutions, data handling and scheduling problems have reached a new scale. In this paper, we present a new data scheduling model with advance resource provisioning, in which data movement operations are defined with earliest start and latest completion times. We analyze timedependent resource assignment problem, and propose a new methodology to improve the current systems by allowing researchers and higher-level meta-schedulers to use data-placement as-a-service, so they can plan ahead and submit transfer requests in advance. In general, scheduling with time and resource conflicts is NP-hard. We introduce an efficient algorithm to organize multiple requests on the fly, while satisfying users ’ time and resource constraints. We successfully tested our algorithm in a simple benchmark simulator that we have developed, and demonstrated its performance with initial test results
Recommended from our members
Open Problems in Network-aware Data Management in Exa-scale Computing and Terabit Networking Era
Accessing and managing large amounts of data is a great challenge in collaborative computing environments where resources and users are geographically distributed. Recent advances in network technology led to next-generation high-performance networks, allowing high-bandwidth connectivity. Efficient use of the network infrastructure is necessary in order to address the increasing data and compute requirements of large-scale applications. We discuss several open problems, evaluate emerging trends, and articulate our perspectives in network-aware data management
Recommended from our members
Error Detection and Error Classification: Failure Awareness in Data Transfer Scheduling
Data transfer in distributed environment is prone to frequent failures resulting from back-end system level problems, like connectivity failure which is technically untraceable by users. Error messages are not logged efficiently, and sometimes are not relevant/useful from users point-of-view. Our study explores the possibility of an efficient error detection and reporting system for such environments. Prior knowledge about the environment and awareness of the actual reason behind a failure would enable higher level planners to make better and accurate decisions. It is necessary to have well defined error detection and error reporting methods to increase the usability and serviceability of existing data transfer protocols and data management systems. We investigate the applicability of early error detection and error classification techniques and propose an error reporting framework and a failure-aware data transfer life cycle to improve arrangement of data transfer operations and to enhance decision making of data transfer schedulers